@zasdfgbnm zasdfgbnm commented Jan 7, 2026

Fixes #4888

Stacked on #5766

I originally worked on this fix in #5082, but hit too many blockers: that PR interacts with many new assumptions, hacks, and unfinalized designs (allocation domain, stream-sharded tensors, multidevice, etc.), and new commits to the main branch kept breaking it. This delayed the fix for a very long time, so I recreated it as this PR, which is friendlier to incremental development.

Today, in the main branch, FusionExecutorCache assumes that fusion segments always generate contiguous tensors. This is not true for ExpressionEvaluator segments; for example, ATen's slice op returns non-contiguous tensors. Note that, because segmentation and scheduler selection depend on inputs, the contiguity of intermediate results also depends on inputs.
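To make the slice case concrete, here is a minimal pure-Python sketch (no PyTorch required) that compares the strides a contiguous-output assumption would predict with the strides a slice view carries. The helper names `contiguous_strides` and `is_contiguous` are illustrative only and are not nvFuser or ATen APIs; the shapes mirror the [4096, 4097] to [4096, 4096] slice in the regression test.

```python
def contiguous_strides(sizes):
    """Row-major strides (in elements) for a densely packed tensor."""
    strides = []
    running = 1
    for size in reversed(sizes):
        strides.append(running)
        running *= max(size, 1)
    return tuple(reversed(strides))


def is_contiguous(sizes, strides):
    """True if the strides describe a dense row-major layout.

    Size-1 dimensions are skipped because their stride never affects addressing.
    """
    expected = 1
    for size, stride in zip(reversed(sizes), reversed(strides)):
        if size == 1:
            continue
        if stride != expected:
            return False
        expected *= size
    return True


# A contiguous [4096, 4097] input, sliced to [4096, 4096] along the last axis.
input_sizes = (4096, 4097)
input_strides = contiguous_strides(input_sizes)  # (4097, 1)

# A slice is a view: it keeps the parent's strides and only shrinks the sizes.
sliced_sizes = (4096, 4096)
sliced_strides = input_strides

# What a "segments always produce contiguous outputs" assumption would predict.
assumed_strides = contiguous_strides(sliced_sizes)  # (4096, 1)

print(assumed_strides, sliced_strides, is_contiguous(sliced_sizes, sliced_strides))
```

The predicted strides (4096, 1) disagree with the view's strides (4097, 1), which is the kind of mismatch this PR is meant to remove.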

This PR adds FusionKernelRuntime::inferOutputMetaTensor(), which replaces inferOutputShapeAndContiguousStrides to infer the output shape and stride of each segment. Both functions store their result as a tensor on the meta device. The difference is that FusionKernelRuntime::inferOutputMetaTensor() actually runs the segment on the meta device if the segment is scheduled to run by ExpressionEvaluator, whereas inferOutputShapeAndContiguousStrides simply assumes the output is contiguous.

Because FusionKernelRuntime::inferOutputMetaTensor() runs the segment on the meta device, the MyOp::evaluate of every op involved must work on the meta device. This design comes with good and bad news. The good news is that most MyOp::evaluate implementations just call at:: ops, which usually already support the meta device, and PyTorch designed the meta device to behave on par with CUDA. The bad news is that many ops' meta device implementations are written in Python, and calling the at:: op for such an op hangs because it cannot acquire Python's GIL (thanks @naoyam for helping debug!). In that case, the corresponding MyOp::evaluate must manually compute the shape and stride and use at::empty_strided(device=meta) to create the result.
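When an op has to compute its output layout by hand, the arithmetic is usually simple. The sketch below models a strided slice in pure Python. `slice_meta` is a hypothetical helper, not an nvFuser or ATen function, and it assumes start/end indices are already normalized and steps are positive. In an actual evaluate implementation, the resulting sizes and strides would be the values handed to at::empty_strided on the meta device.

```python
def slice_meta(sizes, strides, starts, ends, steps):
    """Compute (sizes, strides, offset) of a slice view without touching data.

    Assumes 0 <= start <= end <= size and step >= 1 for every dimension.
    """
    out_sizes, out_strides, offset = [], [], 0
    for size, stride, start, end, step in zip(sizes, strides, starts, ends, steps):
        if not (0 <= start <= end <= size) or step < 1:
            raise ValueError("indices must be normalized and steps positive")
        # Number of elements visited along this dimension: ceil((end - start) / step).
        out_sizes.append((end - start + step - 1) // step)
        # Each step in the output jumps `step` input elements along this dimension.
        out_strides.append(stride * step)
        # The view begins `start` elements into this dimension.
        offset += start * stride
    return tuple(out_sizes), tuple(out_strides), offset


# Same slice as in the regression test: [1, 1, 4096, 4097] -> [1, 1, 4096, 4096].
sizes, strides, offset = slice_meta(
    sizes=(1, 1, 4096, 4097),
    strides=(4096 * 4097, 4096 * 4097, 4097, 1),
    starts=(0, 0, 0, 0),
    ends=(1, 1, 4096, 4096),
    steps=(1, 1, 1, 1),
)
print(sizes, strides, offset)
```

The row stride stays at 4097 even though the row length is now 4096, so a meta tensor built from these values carries the non-contiguous layout forward instead of a guessed contiguous one.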

Besides FusionKernelRuntime::inferOutputMetaTensor(), this PR also adds FusionKernelRuntime::updateContiguityOfSegmentOutputs(), which updates the contiguity of the segment output TensorViews based on the inferred shape and stride.
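One way to picture this update is as a function from concrete sizes and strides to per-dimension contiguity flags. The following is a simplified model and not the logic inside updateContiguityOfSegmentOutputs: it assumes a dimension is contiguous when its stride equals the stride times the size of the nearest inner non-size-1 dimension, that the innermost such dimension is contiguous when its stride is 1, and that size-1 dimensions get `None`, similar to the `contiguity=[None, True, True, True]` input in the regression test. `infer_contiguity` is a hypothetical name.

```python
def infer_contiguity(sizes, strides):
    """Derive per-dimension contiguity flags from concrete sizes and strides.

    Simplified model: size-1 dimensions are treated like broadcasts and get None.
    """
    flags = [None] * len(sizes)
    inner_extent = None  # stride * size of the nearest inner non-size-1 dimension
    for dim in reversed(range(len(sizes))):
        if sizes[dim] == 1:
            continue
        if inner_extent is None:
            # Innermost real dimension: contiguous only with unit stride.
            flags[dim] = strides[dim] == 1
        else:
            # Outer dimension: contiguous if it starts right where the inner one ends.
            flags[dim] = strides[dim] == inner_extent
        inner_extent = strides[dim] * sizes[dim]
    return flags


dense = infer_contiguity((1, 32, 4096, 4096), (32 * 4096 * 4096, 4096 * 4096, 4096, 1))
sliced = infer_contiguity((4096, 4096), (4097, 1))
print(dense)   # [None, True, True, True]
print(sliced)  # [False, True]
```

In this model the sliced tensor loses contiguity only in its outer dimension, which is the kind of per-dimension information a downstream scheduler would need.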

This PR adds an enable option "infer-contiguity" so the feature can be enabled incrementally. When "infer-contiguity" is disabled, FusionKernelRuntime::inferOutputMetaTensor() falls back to the behavior of inferOutputShapeAndContiguousStrides, and FusionKernelRuntime::updateContiguityOfSegmentOutputs() is a no-op. The plan is to merge this PR without setting "infer-contiguity" for the currently failing tests; I will then write follow-up PRs that fix those tests one by one.

github-actions bot commented Jan 7, 2026

Review updated until commit c81f895

Description

  • Add new InferContiguity option to control contiguity inference behavior

  • Replace inferOutputShapeAndContiguousStrides with inferContiguousOutputMetaTensor for clarity

  • Introduce FusionKernelRuntime::inferOutputMetaTensor() to run expr-eval segments on meta device for accurate contiguity

  • Add updateContiguityOfSegmentOutputs() to update contiguity info based on runtime behavior

  • Fix matmul evaluation to conditionally use old contiguous assumption when option disabled

  • Update test configurations to set/unset InferContiguity option appropriately

  • Add regression test for issue #4888 (FusionKernelRuntime::getMaybeHeuristicsFor computes the wrong strides) to verify contiguity inference works correctly

Changes walkthrough

Relevant files
Configuration changes (2 files)
  • options.cpp: Add InferContiguity option to available options (+1/-0)
  • options.h: Define InferContiguity enable option enum (+1/-0)

Enhancement (6 files)
  • allocations.cpp: Rename function to inferContiguousOutputMetaTensor (+2/-2)
  • allocations.h: Update function signature for renamed allocation function (+1/-1)
  • fusion_kernel_runtime.cpp: Add inferOutputMetaTensor and updateContiguityOfSegmentOutputs methods (+68/-8)
  • fusion_kernel_runtime.h: Declare new methods for meta tensor inference and contiguity updates (+21/-0)
  • conftest.py: Add enable/disable options support to exec_nvfuser (+11/-1)
  • utils.py: Add options support to check_captured_python_definition (+18/-2)

Bug fix (1 file)
  • composite_nodes.cpp: Update matmul evaluation to use InferContiguity option (+12/-8)

Miscellaneous (1 file)
  • fusion_cache_utils.cpp: Add missing include for ir_utils (+1/-0)

Tests (12 files)
  • test_alias.cpp: Disable InferContiguity option for alias test (+3/-0)
  • test_indexing_advanced.cpp: Enable InferContiguity option for advanced indexing tests (+2/-0)
  • test_layout_op.cpp: Disable InferContiguity option for layout op test (+1/-0)
  • test_loop_domain_scheduling.cpp: Enable InferContiguity option for loop domain scheduling test (+1/-0)
  • test_low_precision_recipe.cpp: Disable InferContiguity option for block quantization test (+7/-1)
  • test_matmul_aten_evaluation.cpp: Remove matmul output strides test (+0/-33)
  • test_matmul_scheduler.cpp: Enable InferContiguity option for matmul scheduler tests (+1/-0)
  • test_pointwise.cpp: Enable InferContiguity option for pointwise tests (+1/-0)
  • test_rng.cpp: Enable InferContiguity option for RNG tests (+1/-0)
  • test_segmentation.cpp: Update expected upcast ops count in segmentation test (+4/-1)
  • utils.cpp: Enable InferContiguity option in NVFuserTest setup (+1/-0)
  • test_python_frontend.py: Add test_issue4888 for contiguity inference regression test (+98/-0)

PR Reviewer Guide

Here are some key observations to aid the review process:

🧪 PR contains tests
⚡ Recommended focus areas for review
Backward Compatibility

The new InferContiguity option changes the behavior of MatmulOp::evaluate(). When disabled, it uses the old logic that assumes contiguous outputs. When enabled, it uses the new logic that infers actual contiguity. This could potentially break existing code that depends on the old behavior. The PR should document this breaking change clearly and provide migration guidance.

// Without InferContiguity, we mistakenly assume the output is contiguous.
if (!isOptionEnabled(EnableOption::InferContiguity)) {
  const auto& [sizes, strides] = inferShapeAndContiguousStrides(out(), ee);
  auto meta_out = at::detail::empty_strided_meta(sizes, strides, a.dtype());

  if (meta_out.is_contiguous()) {
    return {matmul_out};
  }

  auto strided_matmul_out = at::empty_strided(sizes, strides, a.options());
  strided_matmul_out = strided_matmul_out.copy_(matmul_out);
  return {strided_matmul_out};
}
return {matmul_out};
Performance Impact

The new inferOutputMetaTensor() function actually runs segments on meta device for ExprEval, which could have performance implications. The PR should include performance benchmarks to show that the overhead is acceptable and doesn't significantly impact runtime performance.

KernelArgumentHolder FusionKernelRuntime::inferOutputMetaTensor(
    HeuristicParamsList* heuristics,
    SegmentedGroup* group_to_run,
    const KernelArgumentHolder& group_runtime_inputs,
    PrecomputedValues* evaluator_precomputed_values) const {
  FUSER_PERF_SCOPE("FusionKernelRuntime::inferOutputMetaTensor");
  NVF_ERROR(heuristics != nullptr);
  Fusion* fusion_to_run = group_to_run->getFusion();
  KernelArgumentHolder group_runtime_outputs;
  const auto& heuristic_params = heuristics->at(group_to_run->groupId());
  const bool is_expr_eval =
      heuristic_params->scheduler_type == SchedulerType::ExprEval;
  if (is_expr_eval && isOptionEnabled(EnableOption::InferContiguity)) {
    // For expr evaluated fusion, the striding rules follow that of ATen.
    ExpressionEvaluator eval_fusion;
    for (auto i : arange(group_to_run->inputs().size())) {
      const auto& tensor_pv = group_runtime_inputs[i];
      if (tensor_pv.is<at::Tensor>()) {
        const auto& t = tensor_pv.as<at::Tensor>();
        if (t.defined()) {
          const auto meta_t = at::empty_strided(
              t.sizes(),
              t.strides(),
              at::TensorOptions().device(at::kMeta).dtype(t.dtype()));
          eval_fusion.bind(fusion_to_run->inputs()[i], meta_t);
        } else {
          eval_fusion.bind(fusion_to_run->inputs()[i], t);
        }
      } else {
        eval_fusion.bind(fusion_to_run->inputs()[i], tensor_pv);
      }
    }
    for (auto v : fusion_to_run->outputs()) {
      auto result = eval_fusion.evaluate(v);
      group_runtime_outputs.push(result);
    }
  } else {
    return inferContiguousOutputMetaTensor(
        fusion_to_run, group_runtime_inputs, evaluator_precomputed_values);
  }
  return group_runtime_outputs;
}
Test Coverage

The new test test_issue4888 is quite complex and comprehensive, which is good. However, it would be beneficial to add simpler, more focused tests that specifically validate the contiguity inference behavior for edge cases like slice operations that are known to produce non-contiguous tensors.

def test_issue4888(nvfuser_direct_test):
    # https://github.com/NVIDIA/Fuser/issues/4888
    def nvfuser_fusion_id2(fd: FusionDefinition) -> None:
        T0 = fd.define_tensor(
            shape=[4096, 4097],
            contiguity=[True, True],
            dtype=DataType.BFloat16,
            is_cpu=False,
            stride_order=[1, 0],
        )
        T1 = fd.define_tensor(
            shape=[4096, 4097],
            contiguity=[True, True],
            dtype=DataType.Bool,
            is_cpu=False,
            stride_order=[1, 0],
        )
        T2 = fd.define_tensor(
            shape=[4096, 4097],
            contiguity=[True, True],
            dtype=DataType.Bool,
            is_cpu=False,
            stride_order=[1, 0],
        )
        T3 = fd.define_tensor(
            shape=[1, 32, 4096, 4096],
            contiguity=[None, True, True, True],
            dtype=DataType.BFloat16,
            is_cpu=False,
            stride_order=[3, 2, 1, 0],
        )
        T4 = fd.ops.cast(T0, dtype=DataType.Float)
        T5 = fd.ops.bitwise_or(T1, T2)
        T6 = fd.ops.set(T5)
        fd.add_output(T6, T1)
        T7 = fd.ops.cast(T6, dtype=DataType.Float)
        T8 = fd.ops.mul(T4, T7)
        T9 = fd.ops.cast(T8, dtype=DataType.BFloat16)
        T10 = fd.ops.set(T9)
        fd.add_output(T10, T0)
        T15 = fd.ops.broadcast_in_dim(T10, shape=[1, 4096, 4097], broadcast_dims=[1, 2])
        T21 = fd.ops.broadcast_in_dim(
            T15, shape=[1, 1, 4096, 4097], broadcast_dims=[0, 2, 3]
        )
        T27 = fd.ops.broadcast_in_dim(
            T21, shape=[1, 1, 4096, 4097], broadcast_dims=[0, 1, 2, 3]
        )
        T43 = fd.ops.slice(
            T27,
            start_indices=[0, 0, 0, 0],
            end_indices=[1, 1, 4096, 4096],
            strides=[1, 1, 1, 1],
            manual_normalization=0,
        )
        T49 = fd.ops.broadcast_in_dim(
            T43, shape=[1, 32, 4096, 4096], broadcast_dims=[0, 1, 2, 3]
        )
        T50 = fd.ops.cast(T49, dtype=DataType.Float)
        T51 = fd.ops.cast(T3, dtype=DataType.Float)
        S52 = fd.define_scalar(0.0883883, dtype=DataType.Double)
        T53 = fd.ops.mul(T51, S52)
        T54 = fd.ops.add(T53, T50)
        T55 = fd.ops.max(T54, dims=[3], keepdim=False, dtype=DataType.Null)
        T61 = fd.ops.broadcast_in_dim(
            T55, shape=[1, 32, 4096, 1], broadcast_dims=[0, 1, 2]
        )
        T67 = fd.ops.broadcast_in_dim(
            T61, shape=[1, 32, 4096, 4096], broadcast_dims=[0, 1, 2, 3]
        )
        T68 = fd.ops.sub(T54, T67)
        T69 = fd.ops.exp(T68)
        T70 = fd.ops.sum(T69, dims=[3], keepdim=False, dtype=DataType.Null)
        T76 = fd.ops.broadcast_in_dim(
            T70, shape=[1, 32, 4096, 1], broadcast_dims=[0, 1, 2]
        )
        T82 = fd.ops.broadcast_in_dim(
            T76, shape=[1, 32, 4096, 4096], broadcast_dims=[0, 1, 2, 3]
        )
        T83 = fd.ops.reciprocal(T82)
        T84 = fd.ops.mul(T69, T83)
        T85 = fd.ops.cast(T84, dtype=DataType.BFloat16)
        fd.add_output(T49)
        fd.add_output(T84)
        fd.add_output(T85)

    inputs = [
        torch.testing.make_tensor((4096, 4097), dtype=torch.bfloat16, device="cuda:0"),
        torch.testing.make_tensor((4096, 4097), dtype=torch.bool, device="cuda:0"),
        torch.testing.make_tensor((4096, 4097), dtype=torch.bool, device="cuda:0"),
        torch.testing.make_tensor(
            (1, 32, 4096, 4096), dtype=torch.bfloat16, device="cuda:0"
        ),
    ]
    nvfuser_direct_test.exec_nvfuser(
        nvfuser_fusion_id2, inputs, enable_options=["infer_contiguity"]
    )

zasdfgbnm and others added 13 commits January 13, 2026 09:52
…andling

- Renamed `inferOutputShapeAndContiguousStrides` to `inferContiguousOutputMetaTensor` for clarity.
- Updated function signatures to remove unnecessary parameters.
- Introduced `inferOutputMetaTensor` in `FusionKernelRuntime` to handle output shape inference for segmented groups.
- Enhanced `updateWithSegmentOutputs` to streamline output management without updating contiguity directly.
- Improved overall code organization and readability.
@zasdfgbnm

!test

@zasdfgbnm zasdfgbnm marked this pull request as ready for review January 15, 2026 18:49
@zasdfgbnm zasdfgbnm requested review from naoyam and wujingyue January 15, 2026 18:49
greptile-apps bot commented Jan 15, 2026

Greptile Summary

This PR fixes issue #4888 where FusionKernelRuntime incorrectly computed strides for intermediate tensors from ExpressionEvaluator segments by assuming all outputs are contiguous. The key changes are:

  • Added FusionKernelRuntime::inferOutputMetaTensor() which executes ExprEval segments on meta device to infer actual output shapes and strides, replacing the previous inferOutputShapeAndContiguousStrides() that assumed contiguity
  • Added FusionKernelRuntime::updateContiguityOfSegmentOutputs() to update TensorView contiguity information based on inferred tensor strides
  • Introduced EnableOption::InferContiguity feature flag to incrementally enable this behavior (currently enabled by default in test setup, but disabled for specific failing tests)
  • Modified MatmulOp::evaluate() to avoid forcing contiguous strides when InferContiguity is enabled
  • Renamed inferOutputShapeAndContiguousStrides to inferContiguousOutputMetaTensor for clarity

The implementation leverages PyTorch's meta device to compute shapes/strides without materializing actual tensors. The PR description mentions that some ATen ops' meta device implementations are Python-based and can hang when called from C++ (due to GIL acquisition issues), requiring manual shape/stride computation using at::empty_strided(device=meta) for those cases.

Confidence Score: 4/5

  • This PR is safe to merge with moderate confidence, using a feature flag for gradual rollout
  • The implementation is well-structured with a feature flag allowing incremental enablement. The core logic for meta device execution is sound, and the PR includes appropriate test coverage including the original failing case from issue #4888 (FusionKernelRuntime::getMaybeHeuristicsFor computes the wrong strides). Score is 4 (not 5) because: (1) some tests are explicitly disabled for InferContiguity, indicating known compatibility issues that need future fixes; (2) the PR description mentions potential GIL-related hangs with certain ATen ops on meta device, though mitigations are in place; (3) this is a behavioral change affecting stride computation that could have subtle effects on downstream code
  • No files require special attention - the implementation is clean and well-structured

Important Files Changed

Filename: Overview
  • csrc/runtime/fusion_kernel_runtime.cpp: Added inferOutputMetaTensor and updateContiguityOfSegmentOutputs methods to infer output shapes/strides using meta device execution for ExprEval segments, replacing inferOutputShapeAndContiguousStrides calls
  • csrc/runtime/fusion_kernel_runtime.h: Added declarations for inferOutputMetaTensor and updateContiguityOfSegmentOutputs methods
  • csrc/options.h: Added InferContiguity enable option to control the new contiguity inference feature
  • csrc/ir/composite_nodes.cpp: Modified MatmulOp::evaluate to conditionally apply stride adjustment only when InferContiguity is disabled, allowing natural ATen output strides when enabled
  • tests/cpp/utils.cpp: Enabled InferContiguity by default in test setup
  • tests/python/direct/test_python_frontend.py: Added test_issue4888 to verify the fix for incorrect stride computation with explicit InferContiguity enablement

Sequence Diagram

sequenceDiagram
    participant User
    participant FusionKernelRuntime
    participant prepareInputs/getMaybeHeuristicsFor
    participant inferOutputMetaTensor
    participant ExpressionEvaluator
    participant ATen as ATen (Meta Device)
    participant updateContiguityOfSegmentOutputs
    participant TensorView

    User->>FusionKernelRuntime: runWithInputs(args)
    FusionKernelRuntime->>prepareInputs/getMaybeHeuristicsFor: Prepare segment inputs
    
    loop For each segment
        prepareInputs/getMaybeHeuristicsFor->>inferOutputMetaTensor: Infer output shape/stride
        
        alt is_expr_eval && InferContiguity enabled
            inferOutputMetaTensor->>ExpressionEvaluator: Create ExpressionEvaluator
            loop For each input
                inferOutputMetaTensor->>ATen: at::empty_strided(sizes, strides, device=meta)
                ATen-->>inferOutputMetaTensor: meta tensor
                inferOutputMetaTensor->>ExpressionEvaluator: bind(input, meta_tensor)
            end
            loop For each output
                ExpressionEvaluator->>ATen: evaluate() - run ATen ops on meta device
                ATen-->>ExpressionEvaluator: result meta tensor with actual strides
                ExpressionEvaluator-->>inferOutputMetaTensor: result
            end
        else not expr_eval or InferContiguity disabled
            inferOutputMetaTensor->>inferOutputMetaTensor: inferContiguousOutputMetaTensor()
            Note right of inferOutputMetaTensor: Assumes contiguous output
        end
        
        inferOutputMetaTensor-->>prepareInputs/getMaybeHeuristicsFor: group_runtime_outputs
        
        prepareInputs/getMaybeHeuristicsFor->>updateContiguityOfSegmentOutputs: Update TensorView contiguity
        
        alt InferContiguity enabled
            loop For each output TensorView
                updateContiguityOfSegmentOutputs->>TensorView: ir_utils::resetContiguityFromTensor(tv, tensor)
                Note right of TensorView: Updates contiguity info from actual tensor strides
            end
        end
        
        updateContiguityOfSegmentOutputs-->>prepareInputs/getMaybeHeuristicsFor: done
    end
    
    prepareInputs/getMaybeHeuristicsFor-->>FusionKernelRuntime: all_runtime_inputs prepared
    FusionKernelRuntime->>FusionKernelRuntime: Execute segments with correct stride info

@greptile-apps greptile-apps bot left a comment

20 files reviewed, 1 comment

auto fusion_to_run = segmented_fusion_->makeFusion(group_to_run).second;
auto group_runtime_outputs = inferOutputShapeAndContiguousStrides(
fusion_to_run.get(), group_runtime_inputs);
auto group_runtime_outputs = inferOutputMetaTensor(
Collaborator

I'm losing track of the code. group_runtime_inputs contain meta tensors or real tensors at this moment? The setDeviceIndex call seems to say they are real tensors.

Collaborator Author

IIUC in prepareInputs, group_runtime_inputs contains real tensor (but still, inferOutputShapeAndContiguousStrides returns meta tensor), but in getMaybeHeuristicsFor, group_runtime_inputs contains meta tensor.

Collaborator

Got it. Should setDeviceIndex at line 419 be removed? Is it safe or necessary? (I don't think your PR changes the situation; just OOC).

@zasdfgbnm zasdfgbnm requested a review from wujingyue January 15, 2026 22:56

args_manager.updateWithSegmentOutputs(
group_to_run->outputs(), group_runtime_outputs, run_order_id);

updateContiguityOfSegmentOutputs(group_to_run, group_runtime_outputs);
Collaborator

Is this to hide some bugs in mark_aliases_prepare or allocation_order_inference? The TensorViews in the complete fusion and therefore in segments ought to be correct after preseg.

Collaborator Author

How do you define "hide a bug"? We need the correct contiguity eventually, which is only possible after we know the scheduler of segmentation. So why isn't this just writing the correct information, rather than hiding a bug?

Collaborator

which is only possible after we know the scheduler of segmentation

But scheduling happens after prepareInputs:

compileKernel(group_runtime_inputs, group_to_run);

I'm probably missing some important details that are so obvious to you. Let me try to remove this line and see where things break...

Collaborator

$ _bn && pytest tests/python/direct/test_python_frontend.py -k test_issue4888 -vs 

passes with the following patch

diff --git a/csrc/runtime/fusion_kernel_runtime.cpp b/csrc/runtime/fusion_kernel_runtime.cpp
index e025d29d..132cba82 100644
--- a/csrc/runtime/fusion_kernel_runtime.cpp
+++ b/csrc/runtime/fusion_kernel_runtime.cpp
@@ -427,8 +427,6 @@ std::vector<KernelArgumentHolder> FusionKernelRuntime::prepareInputs(
     // map output args to tensor map
     args_manager.updateWithSegmentOutputs(
         group_to_run->outputs(), group_runtime_outputs, run_order_id);
-
-    updateContiguityOfSegmentOutputs(group_to_run, group_runtime_outputs);
   }
 
   return all_runtime_inputs;

But let me try other tests as well...

Collaborator

I missed the other call to updateContiguityOfSegmentOutputs. After removing that, I see SegmentationTest.RevertPrivatizedUpcast fails. Let me try to understand the error...

$ bin/test_nvfuser --gtest_filter=SegmentationTest.RevertPrivatizedUpcast 
Running main() from /opt/pytorch/nvfuser/third_party/googletest/googletest/src/gtest_main.cc
Note: Google Test filter = SegmentationTest.RevertPrivatizedUpcast
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from SegmentationTest
[ RUN      ] SegmentationTest.RevertPrivatizedUpcast
/opt/pytorch/nvfuser/tests/cpp/test_segmentation.cpp:855: Failure
Expected equality of these values:
  num_upcast_ops
    Which is: 1
  2

To reproduce: NVFUSER_TEST_RANDOM_SEED=1768609993 NVFUSER_TEST_ATEN_RANDOM_SEED=0 test_nvfuser --gtest_filter='SegmentationTest.RevertPrivatizedUpcast'
[  FAILED  ] SegmentationTest.RevertPrivatizedUpcast (218 ms)
[----------] 1 test from SegmentationTest (218 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test suite ran. (218 ms total)
[  PASSED  ] 0 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] SegmentationTest.RevertPrivatizedUpcast

 1 FAILED TEST
